EDF Tests for Normality in Linear Models
after a Box-Cox Transformation


Gemai  Chen,  Richard  Lockhart  and  Michael  A.  Stephens 
Simon Fraser University, B.C., Canada


Summary 


The Box-Cox transformation procedure has been used extensively in data analysis, for example
in regression, where the response variable is subjected to a suitable power transformation
so that the standard normal-theory linear regression models can be fitted to the transformed
values. In this paper, distribution theory is developed for a family of EDF statistics, including
the Anderson-Darling statistic $A^2$ and the Cramér-von Mises statistic $W^2$, so that these
statistics can be used to test for normality in the linear model after applying the Box-Cox
transformation. A table of asymptotic critical points is given for $A^2$ and $W^2$, and numerical
examples are given to illustrate the use of the table.


Keywords: LINEAR REGRESSION, NON-LINEAR REGRESSION, MAXIMUM LIKELIHOOD
ESTIMATION, TRANSFORMATIONS TO NORMALITY


1  Introduction 


The Box-Cox transformation procedure has been used extensively in regression analysis,
in which the response variable is subjected to a suitable power transformation so that the
standard normal-theory linear regression model can be fitted to the transformed responses.

Let $Y_1, \ldots, Y_n$ be positive independent random variables denoting responses to variables $X$.
For a real number $\lambda$, the Box-Cox power transformation family is

$$
Y_i(\lambda) =
\begin{cases}
(Y_i^{\lambda} - 1)/\lambda & \text{if } \lambda \neq 0, \\
\log Y_i & \text{if } \lambda = 0.
\end{cases}
\tag{1.1}
$$

The Box-Cox transformation is used to find a suitable $\lambda$ so that, after transformation, the
following linear model is approximately applicable,

$$
Y(\lambda) \approx X\beta + \sigma\epsilon, \tag{1.2}
$$

where $X = (x_{ij})$ is a known $n \times p$ matrix of constants, $\beta = (\beta_1, \ldots, \beta_p)^t$ is a vector of
unknown regression parameters (a column vector), $\sigma$ is an unknown positive constant, the components of
$\epsilon = (\epsilon_1, \ldots, \epsilon_n)^t$ are independent and identically distributed standard normal random
variables, and $Y(\lambda) = (Y_1(\lambda), \ldots, Y_n(\lambda))^t$; superscript $t$ denotes transpose.

An important part of this model is the assumption that the $\epsilon_i$ are i.i.d. $N(0, 1)$, and
in this paper we propose tests for this assumption. The tests will be based on the empirical
distribution function (EDF) of the estimated residuals, and we give tables for the Cramér-von
Mises statistic $W^2$ and the Anderson-Darling statistic $A^2$.

We begin by making two important comments. Firstly, a test for normality after regression,
without the Box-Cox transformation, and using the estimated residuals, is known to be the
same asymptotically as a one-sample test for which the mean and variance must be estimated;
see Stephens (1986, Section 4.8.5) for the procedure. Here the situation is different, essentially
because $\lambda$ must also be estimated, and one might expect that the tables of percentage points
for the test will depend, even asymptotically, on the true values of the various parameters.
However, we show that the problem can be reduced so that, in most practical circumstances,
the tables depend on only one new parameter $g$, to be given in Section 2; $g$ is a function of the
estimated regression parameters and of $\hat\sigma$, the estimate of $\sigma$.


Secondly, we observe that in model (1.2), the parameters are usually estimated as though
there were no restraint on the $Y_i(\lambda)$. However, (1.2) cannot be precisely correct except when
$\lambda = 0$, since the right hand side can take on any value, but the left hand side must be greater
than $-1/\lambda$ when $\lambda > 0$ (and less than $-1/\lambda$ when $\lambda < 0$). In practice, this restriction will
make very little difference to the algebra of fitting model (1.2); we discuss this issue in Section 2.

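Nevertheless, for the purpose of developing a sound theory for the tests of fit, it is necessary
to use the correct density and distribution of $Y_i$, $i = 1, 2, \ldots, n$. The density is the following
truncated normal distribution (see, for example, Poirier, 1978)

$$
f(y_i; \lambda, \mu_i, \sigma^2) =
\begin{cases}
y_i^{\lambda-1}\sigma^{-1}\phi\left[\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}/\sigma\right]/\Phi(\delta_i), & \text{if } \lambda > 0, \\
y_i^{-1}\sigma^{-1}\phi\{(\log y_i - \mu_i)/\sigma\}, & \text{if } \lambda = 0, \\
y_i^{\lambda-1}\sigma^{-1}\phi\left[\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}/\sigma\right]/\Phi(-\delta_i), & \text{if } \lambda < 0.
\end{cases}
\tag{1.3}
$$

In (1.3), $y_i > 0$; in addition, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and distribution function of a standard
normal random variable, $\mu_i = x_i^t\beta$, with $x_i^t$ denoting the $i$th row of $X$, and $\delta_i = (\mu_i + 1/\lambda)/\sigma$.
Note that in model (1.3) $E(Y_i(\lambda)) \neq \mu_i$, so that model (1.3) is a non-linear model in $\beta$.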


The plan of this paper is the following. Section 2 discusses parameter estimation in model
(1.3) and presents a family of EDF statistics for testing the fit of the model. In particular, we
consider the effect of estimating $\lambda$, and introduce the parameter $g$ which is the argument of
the tables used with the tests. Section 3 illustrates the use of the EDF tests with real data.
Section 4 contains the theory for the EDF tests introduced in Section 2, and in Section 5 the
accuracy of the tests is discussed.


2 Parameter Estimation and EDF Tests of Fit


2.1  Parameter  Estimation 

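We first introduce $\nu$ for $\sigma^2$ and then use maximum likelihood to estimate parameters $\lambda$, $\beta$
and $\nu$. Denote by $L$ the log-likelihood function of a random sample $Y_1, \ldots, Y_n$ based on model
(1.3); then except for a constant,

$$
L =
\begin{cases}
-(n/2)\log\nu - (2\nu)^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}^2
 + (\lambda - 1)\sum_{i=1}^{n}\log y_i - \sum_{i=1}^{n}\log\Phi(\delta_i), & \text{if } \lambda > 0, \\
-(n/2)\log\nu - (2\nu)^{-1}\sum_{i=1}^{n}\{\log y_i - \mu_i\}^2 - \sum_{i=1}^{n}\log y_i, & \text{if } \lambda = 0, \\
-(n/2)\log\nu - (2\nu)^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}^2
 + (\lambda - 1)\sum_{i=1}^{n}\log y_i - \sum_{i=1}^{n}\log\Phi(-\delta_i), & \text{if } \lambda < 0.
\end{cases}
\tag{2.1}
$$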

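Because $Y_i(\lambda)$ defined by (1.1) is differentiable with respect to $\lambda$, $L$ is differentiable with respect
to $\lambda$, $\beta$ and $\nu$. Thus, for $\lambda > 0$ the likelihood equations are

$$
\frac{\partial L}{\partial \beta_j} = \nu^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}x_{ij}
 - \sum_{i=1}^{n}[\phi(\delta_i)/\Phi(\delta_i)](x_{ij}/\sqrt{\nu}) = 0, \quad j = 1, \ldots, p, \tag{2.2}
$$

$$
\frac{\partial L}{\partial \nu} = -n/(2\nu) + (2\nu^2)^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}^2
 + \sum_{i=1}^{n}[\phi(\delta_i)/\Phi(\delta_i)][\delta_i/(2\nu)] = 0, \tag{2.3}
$$

$$
\frac{\partial L}{\partial \lambda} = -(\lambda^2\nu)^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}(\lambda y_i^{\lambda}\log y_i - y_i^{\lambda} + 1)
 + \sum_{i=1}^{n}[\phi(\delta_i)/\Phi(\delta_i)][1/(\lambda^2\sqrt{\nu})] + \sum_{i=1}^{n}\log y_i = 0. \tag{2.4}
$$

Similar likelihood equations can be found for $\lambda < 0$ and $\lambda = 0$.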

It does not seem possible to find closed-form maximum likelihood estimators for $\beta$, $\nu$ and
$\lambda$ from the likelihood function directly, or from the likelihood equations. Therefore, iterative
numerical methods are necessary.

The usual Box-Cox approach to parameter estimation gives a log-likelihood function which
is related to $L$ above. We denote by $l_{BC}$ the log-likelihood function of model (1.2) discussed
by Box and Cox (1964). Then, except for a constant,

$$
l_{BC} = -(n/2)\log\nu - (2\nu)^{-1}\sum_{i=1}^{n}\{(y_i^{\lambda} - 1)/\lambda - \mu_i\}^2 + (\lambda - 1)\sum_{i=1}^{n}\log y_i. \tag{2.5}
$$

The Box-Cox method maximizes $l_{BC}$ over $\beta$ and $\nu$ with $\lambda$ kept fixed; the result is a function
of $\lambda$ alone that is given by

$$
l_{BC}(\lambda) = -(n/2)\log\hat\nu_{BC}(\lambda) - (n/2) + (\lambda - 1)\sum_{i=1}^{n}\log y_i, \tag{2.6}
$$

where $n\hat\nu_{BC}(\lambda) = Y(\lambda)^t(I - X(X^tX)^{-1}X^t)Y(\lambda)$ is the residual sum of squares from regressing
$Y(\lambda)$ on $X$. The final value $\hat\lambda$ of $\lambda$ for subsequent analysis is determined by maximizing $l_{BC}(\lambda)$
over $\lambda$, and the estimates for $\beta$ and $\nu$ are then given by

$$
\hat\beta = (X^tX)^{-1}X^tY(\hat\lambda), \tag{2.7}
$$

$$
\hat\nu = \frac{1}{n}Y(\hat\lambda)^t(I - X(X^tX)^{-1}X^t)Y(\hat\lambda). \tag{2.8}
$$

Note that the residual sum of squares is usually divided by $n - p$ to obtain an estimate of $\sigma^2$,
where $p$ is the number of regression parameters. The examples given in the next section will
follow this convention.
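The Box-Cox fitting procedure of (2.5) to (2.8) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`boxcox`, `profile_loglik`, `fit_boxcox`), the grid search over $\lambda$ and the synthetic data are all assumptions made here for the example, and NumPy is assumed to be available.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform (1.1); uses log y when lam is numerically zero."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Profile log-likelihood (2.6), up to an additive constant."""
    n = len(y)
    z = boxcox(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)      # least squares for fixed lam
    rss = float(np.sum((z - X @ beta) ** 2))          # n * nu_hat_BC(lam)
    return -(n / 2) * np.log(rss / n) - n / 2 + (lam - 1) * np.sum(np.log(y))

def fit_boxcox(y, X, grid=None):
    """Grid-search maximiser of (2.6); returns (lam_hat, beta_hat, nu_hat)."""
    grid = np.linspace(-2.0, 2.0, 401) if grid is None else grid
    lam_hat = max(grid, key=lambda lam: profile_loglik(lam, y, X))
    z = boxcox(y, lam_hat)
    beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)  # (2.7)
    nu_hat = float(np.sum((z - X @ beta_hat) ** 2)) / len(y)  # (2.8), divisor n
    return float(lam_hat), beta_hat, nu_hat

# Synthetic illustration (hypothetical data): log y is exactly linear in x,
# so the estimate of lambda should land near 0.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
y = np.exp(X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(n))
lam_hat, beta_hat, nu_hat = fit_boxcox(y, X)
```

A grid search is used only because it is easy to read; a one-dimensional optimiser would serve equally well. Dividing the residual sum of squares by $n - p$ instead of $n$ gives the convention used in the examples.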

It is useful to compare the results obtained by using $l_{BC}$ rather than $L$ to estimate the
parameters. It can be seen by comparing (2.5) to (2.1) that $l_{BC} \approx L$ if $-\sum_{i=1}^{n}\log\Phi(\delta_i) \approx 0$,
if $\lambda > 0$, or $-\sum_{i=1}^{n}\log\Phi(-\delta_i) \approx 0$, if $\lambda < 0$. This happens if (1) $\lambda$ is close to zero, or (2) the $\mu_i$'s
(or $-\mu_i$'s) are large, or (3) $\nu$ is small. In practice, one or more of these conditions often holds.
Consider, for example, the case $\lambda > 0$. Suppose $\delta^+ = \min_{1 \le i \le n}\{\delta_i\}$; if $\delta^+ > \Phi^{-1}(e^{-c/n})$ for a
positive constant $c$, then

$$
-\sum_{i=1}^{n}\log\Phi(\delta_i) \le -n\log\Phi(\delta^+) < c.
$$

For example, suppose $c = 0.01$ and $n = 50$; then $\Phi^{-1}(e^{-c/n}) = 3.54$. Thus, if the
minimum $\delta_i = \min(\mu_i + 1/\lambda)/\sigma$ has a value greater than 3.54, then truncating the left tail as in (1.3) cuts
off less than 0.01 from the log-likelihood. A similar result holds for the case when $\lambda < 0$. It
often happens that $-\sum_{i=1}^{n}\log\Phi(\delta_i)$ (or, for $\lambda < 0$, $-\sum_{i=1}^{n}\log\Phi(-\delta_i)$) is small, so that $l_{BC} \approx L$,
and then use of $l_{BC}$ to estimate parameters from a given set of data will yield almost the same
results as use of the likelihood $L$.
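The numerical example above can be checked directly. The sketch below uses only the Python standard library; the helper names `delta_threshold` and `truncation_term` are hypothetical and introduced here for illustration, and the value 3.6 used for the $\delta_i$ is an arbitrary choice above the threshold.

```python
from math import exp, log
from statistics import NormalDist

def delta_threshold(c, n):
    """Phi^{-1}(exp(-c/n)): if every delta_i exceeds this, the truncation term is below c."""
    return NormalDist().inv_cdf(exp(-c / n))

def truncation_term(deltas):
    """-sum_i log Phi(delta_i): the gap between l_BC and L when lambda > 0."""
    std = NormalDist()
    return -sum(log(std.cdf(d)) for d in deltas)

threshold = delta_threshold(0.01, 50)     # about 3.54 for c = 0.01, n = 50
gap = truncation_term([3.6] * 50)         # all delta_i just above the threshold
```

With all fifty $\delta_i$ equal to 3.6 the truncation term stays below $c = 0.01$, in line with the bound.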
2.2 EDF Tests of Fit


To test for goodness-of-fit when fitting model (1.3) to data, we use the well-known Cramér-von
Mises family of EDF statistics. For the present case, let $\gamma = (\lambda, \beta^t, \nu)^t$ and let
$\hat\gamma = (\hat\lambda, \hat\beta^t, \hat\nu)^t$ be the maximum likelihood estimate for $\gamma$. The cumulative distribution function
for $Y_i$ in model (1.3) is given by

$$
F_i(y_i; \gamma) =
\begin{cases}
\{\Phi(y_i^*) + \Phi(\delta_i) - 1\}/\Phi(\delta_i), & \text{if } \lambda > 0, \\
\Phi\{(\log y_i - \mu_i)/\sqrt{\nu}\}, & \text{if } \lambda = 0, \\
\Phi(y_i^*)/\Phi(-\delta_i), & \text{if } \lambda < 0,
\end{cases}
\tag{2.9}
$$

where $\mu_i = x_i^t\beta$, $\delta_i = (\mu_i + 1/\lambda)/\sqrt{\nu}$, and $y_i^* = \{(y_i^{\lambda} - 1)/\lambda - \mu_i\}/\sqrt{\nu}$. Now for each $i$, let
$v_i = F_i(y_i; \hat\gamma)$ and let the empirical distribution function of the $v_i$'s be

$$
F_n(t) = \frac{1}{n}\sum_{i=1}^{n} 1[v_i \le t], \quad (0 \le t \le 1), \tag{2.10}
$$

where $1[A] = 1$ if $A$ is true, otherwise $1[A] = 0$. The EDF statistics are based on the
discrepancies between $F_n(t)$ and $F(t) = t$ $(0 \le t \le 1)$, namely,

$$
n\int_0^1 \{F_n(t) - t\}^2\psi(t)\,dt, \tag{2.11}
$$

where $\psi(t) \ge 0$ is a suitable known weight function. As special cases, the Cramér-von Mises
statistic $W^2$ is obtained when $\psi(t) = 1$, and the Anderson-Darling statistic $A^2$ is obtained when
$\psi(t) = \{t(1 - t)\}^{-1}$. Let $v_{(1)} \le v_{(2)} \le \cdots \le v_{(n)}$ be the order statistics of the $v_i$. Statistics
$W^2$ and $A^2$ are then given by

$$
W^2 = \sum_{i=1}^{n}\left\{v_{(i)} - \frac{2i - 1}{2n}\right\}^2 + \frac{1}{12n}, \tag{2.12}
$$

$$
A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\{(2i - 1)\log v_{(i)} + (2n + 1 - 2i)\log(1 - v_{(i)})\}. \tag{2.13}
$$
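As a rough illustration of this subsection, the following Python sketch computes the values $v_i$ from the distribution function of the truncated model and then the two statistics from the standard computing formulas, $W^2 = \sum_i \{v_{(i)} - (2i-1)/(2n)\}^2 + 1/(12n)$ and $A^2 = -n - n^{-1}\sum_i \{(2i-1)\log v_{(i)} + (2n+1-2i)\log(1 - v_{(i)})\}$. The function names `pit_values` and `edf_statistics` are hypothetical, and NumPy is assumed.

```python
import numpy as np
from statistics import NormalDist

_PHI = np.vectorize(NormalDist().cdf)   # standard normal distribution function

def pit_values(y, mu, nu, lam):
    """v_i = F_i(y_i; gamma) for model (1.3), following the three cases of (2.9)."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    sd = np.sqrt(nu)
    if lam == 0:
        return _PHI((np.log(y) - mu) / sd)
    ystar = ((y**lam - 1.0) / lam - mu) / sd
    delta = (mu + 1.0 / lam) / sd
    if lam > 0:
        return (_PHI(ystar) + _PHI(delta) - 1.0) / _PHI(delta)
    return _PHI(ystar) / _PHI(-delta)

def edf_statistics(v):
    """Return (W^2, A^2) from values v_i in (0, 1)."""
    v = np.sort(np.asarray(v, dtype=float))
    n = len(v)
    i = np.arange(1, n + 1)
    w2 = np.sum((v - (2 * i - 1) / (2 * n)) ** 2) + 1.0 / (12 * n)
    a2 = -n - np.sum((2 * i - 1) * np.log(v) + (2 * n + 1 - 2 * i) * np.log(1 - v)) / n
    return float(w2), float(a2)

# Perfectly spaced values give the smallest possible W^2, namely 1/(12n).
w2, a2 = edf_statistics((np.arange(1, 11) - 0.5) / 10)
```

In practice the $v_i$ would be computed with the estimates $\hat\lambda$, $\hat\beta$ and $\hat\nu$ substituted for the true parameters.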


2.3 The Goodness-of-Fit Test Procedure


Suppose that the model matrix $X$ is of the form $X = (1_n \; V)$, where $1_n$ is an $n$ by 1 vector
of 1's. Center matrix $V$ into $U$ by subtracting from each column the mean of that column. Let
$u_i^t$ denote the $i$th row of $U$. Denote the regression parameters by $\beta = (\beta_1, \ldots, \beta_p)^t$, where
$\beta_1$ is the intercept and $(\beta_2, \beta_3, \ldots, \beta_p)^t$ are the coefficients of the columns of $U$; when no
covariates are involved, $\beta = \beta_1 = \mu$, say, where $\mu$ is the grand mean.

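To perform a goodness-of-fit test of $H_0$: model (1.3) fits the data, the following steps are
taken:

(a) Find $\hat\lambda$, $\hat\beta$ and $\hat\nu$ as described above,

(b) Compute $v_i = F_i(y_i; \hat\gamma)$ according to (2.9) with the true parameters replaced by their
estimates. In practice, the use of $v_i = \Phi(y_i^*)$, where $y_i^* = (\{(y_i^{\hat\lambda} - 1)/\hat\lambda\} - \hat\mu_i)/\hat\sigma$, $\hat\sigma = \sqrt{\hat\nu}$,
will almost always give the same test result,

(c) Put the $v_i$ in ascending order, and calculate $W^2$ or $A^2$ according to (2.12) or (2.13),
respectively,

(d) Find $\hat\eta_i = u_i^t(\hat\beta_2, \ldots, \hat\beta_p)^t/\sqrt{\hat\nu}$, and obtain a quantity $\hat g$ defined by

$$
\hat g = 6 + \frac{1}{n}\sum_{i=1}^{n}\{8\hat\eta_i^2 + \hat\eta_i^4\} - \left(\frac{1}{n}\sum_{i=1}^{n}\hat\eta_i^2\right)^2 - \frac{1}{n}a_n(U^tU)^{-1}a_n^t, \tag{2.14}
$$

where $a_n = \sum_{i=1}^{n}\hat\eta_i^2u_i^t$,

(e) Enter Table 1 with the value of $1/\hat g$, and reject $H_0$ at significance level $\alpha$ if the test statistic
exceeds the corresponding upper $\alpha$-percentile given in Table 1.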
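Step (d) can be sketched as follows. The sketch assumes that $\hat g$ is the sample analogue of the constant $g$ of Theorem 4.1, that is, $6 + 8c_1 + c_2 - c_1^2 - a(n^{-1}U^tU)^{-1}a^t$ with $c_1$, $c_2$ and $a$ replaced by the averages of $\hat\eta_i^2$, $\hat\eta_i^4$ and $\hat\eta_i^2u_i^t$. The function name `g_hat` and its arguments are hypothetical; NumPy is assumed.

```python
import numpy as np

def g_hat(V, slopes, nu):
    """Estimate g from the raw covariate matrix V (n x (p-1), no intercept column),
    the fitted slope coefficients (beta_2, ..., beta_p) and the variance estimate nu."""
    V = np.asarray(V, dtype=float)
    if V.ndim < 2 or V.shape[1] == 0:
        return 6.0                          # no covariates: one-sample case
    n = V.shape[0]
    U = V - V.mean(axis=0)                  # centre each column
    eta = U @ np.asarray(slopes, dtype=float) / np.sqrt(nu)
    a = (eta**2) @ U / n                    # average of eta_i^2 * u_i (as a row)
    quad = a @ np.linalg.solve(U.T @ U / n, a)
    return float(6.0 + np.mean(8 * eta**2 + eta**4) - np.mean(eta**2) ** 2 - quad)

# Tiny hand-checkable case: two points, one covariate, eta = (-1, 1).
g_small = g_hat([[0.0], [2.0]], [1.0], 1.0)   # 6 + 9 - 1 - 0 = 14
```

The value $1/\hat g$ is then used to select the row of Table 1, interpolating between rows if needed.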

The entries in Table 1 are the upper percentiles, for the appropriate $1/g$, of the asymptotic
distributions of $W^2$ and $A^2$, respectively, as $n \to \infty$, and as $|(\beta_1 + 1/\lambda)/\sigma| \to \infty$. Note that the
upper percentiles corresponding to $1/g = 0$ are the upper percentiles for testing goodness-of-fit
of linear models without taking any Box-Cox transformation (see Stephens (1986), Section
4.8.5). The percentiles for $1/g$ differ increasingly from these values as $1/g$ grows larger. When
$g = 6$ the situation corresponds to the case where there is no regression, and the $Y_i(\lambda)$ are
simply a transformed random sample.

In principle, the tabulated distributions could be expected to depend on the various values
of the parameters $\beta$ and $\lambda$, but in fact the effect of estimating these parameters is all contained
in the one estimated parameter $\hat g$. The accuracy of the above tests is discussed in Sections 4
and 5.


Table 1: Upper percentiles of the asymptotic distributions of $W^2$ and $A^2$ for testing Box-Cox
transformations when $|(\beta_1 + 1/\lambda)/\sigma| \to \infty$ and $n \to \infty$.

                                 Upper percentiles ($\alpha$)
Statistic  1/g       0.50     0.25     0.20     0.15     0.10     0.05     0.01

$W^2$      0         0.0508   0.0739   0.0812   0.0915   0.1036   0.1260   0.1787
           1/1000    0.0508   0.0738   0.0810   0.0905   0.1031   0.1258   0.1785
           1/600     0.0508   0.0738   0.0810   0.0903   0.1031   0.1257   0.1783
           1/400     0.0507   0.0737   0.0809   0.0902   0.1030   0.1256   0.1781
           1/200     0.0506   0.0736   0.0807   0.0899   0.1028   0.1251   0.1773
           1/100     0.0504   0.0731   0.0802   0.0894   0.1022   0.1243   0.1759
           1/60      0.0501   0.0726   0.0796   0.0887   0.1011   0.1231   0.1740
           1/40      0.0498   0.0719   0.0787   0.0878   0.1002   0.1217   0.1716
           1/20      0.0487   0.0700   0.0766   0.0851   0.0970   0.1175   0.1649
           1/10      0.0463   0.0660   0.0721   0.0800   0.0909   0.1097   0.1530
           1/6       0.0428   0.0608   0.0663   0.0736   0.0836   0.1007   0.1406

$A^2$      0         0.3405   0.4702   0.5100   0.5607   0.6318   0.7530   1.0375
           1/1000    0.3403   0.4697   0.5094   0.5601   0.6310   0.7520   1.0351
           1/600     0.3400   0.4693   0.5090   0.5596   0.6304   0.7512   1.0339
           1/400     0.3398   0.4689   0.5085   0.5590   0.6297   0.7504   1.0326
           1/200     0.3392   0.4677   0.5071   0.5574   0.6277   0.7476   1.0281
           1/100     0.3378   0.4653   0.5043   0.5541   0.6236   0.7422   1.0187
           1/60      0.3359   0.4620   0.5005   0.5496   0.6182   0.7351   1.0007
           1/40      0.3335   0.4578   0.4958   0.5441   0.6115   0.7262   0.9928
           1/20      0.3262   0.4454   0.4817   0.5277   0.5918   0.7004   0.9518
           1/10      0.3106   0.4202   0.4537   0.4958   0.5546   0.6537   0.8820
           1/6       0.2871   0.3880   0.4186   0.4575   0.5117   0.6035   0.8168


3  Examples 

Three examples are given below to illustrate the use of Table 1. These examples deal with
three typical situations where $\hat\lambda$ is positive, close to zero, and negative.

Example 1. Textile Data. Table 4 of Box and Cox (1964) contains the result of a single
replicate of a $3^3$ factorial experiment. The response $y$ is the cycles to failure of worsted yarn,
and the three explanatory variables assume three different levels each; see Box and Cox (1964)
for details.

Three main effect linear models are fitted to the data. The first model uses $y$ directly. The
second model transforms $y$ according to (1.1) and the third model uses the log transformation
since the estimate of $\lambda$ is very close to 0. Parameter estimates are obtained by directly
maximizing the log-likelihood function and by applying the Box-Cox transformation procedure; the
results are practically the same, and $\hat\sigma$ and $\hat\lambda$ are given in Table 2. The much smaller values of
$W^2$ and $A^2$ show that the transformed models are much better fits to the data.

Table 2: EDF tests of fit for three main effect linear models, textile data, Example 1.

               Parameter Estimates                 Modified EDF (P-value)
Model          $\hat\sigma$   $\hat\lambda$   $\hat g$    $A^2$              $W^2$

$y$            488.2          -               -           1.3523 (<0.01)     0.2364 (<0.01)
$y(\lambda)$   0.126          -0.059          1042        0.3372 (>0.50)     0.0495 (>0.50)
$\log y$       0.186          0               981         0.2480 (>0.50)     0.0323 (>0.50)

For the model using $y(\lambda)$: minimum of $-\delta_i$ = 82.577.

Example 2. Tree Data. The tree data in the Minitab Student Handbook (Ryan, Joiner
and Ryan, 1976, page 278) are analyzed here. The heights ($x_1$), the diameters ($x_2$) at 4.5 ft
above ground level and the volumes ($y$) were measured for a sample of 31 black cherry trees
in the Allegheny National Forest, Pennsylvania. The data were collected to determine an easy
way of estimating the volume of a tree based on its height and diameter.

Again, three linear models are fitted to the data, using $y$, $y(\hat\lambda)$ and $y(1/3)$, where $1/3$ is chosen
from the dimension of volume versus length. Parameter estimates were obtained by directly
maximizing the log-likelihood function and by applying the Box-Cox transformation procedure;
the estimates are again virtually the same. Table 3 contains the results.

Table 3: EDF tests of fit for three straight line models, tree data, Example 2.

                  Parameter Estimates                 Modified EDF (P-value)
Model             $\hat\sigma$   $\hat\lambda$   $\hat g$    $A^2$              $W^2$

1  $y$            3.882          -               -           0.2482 (>0.50)     0.0361 (>0.50)
2  $y(\lambda)$   0.227          0.307           2877        0.2925 (>0.50)     0.0450 (>0.50)
3  $y(1/3)$       0.249          1/3             2894        0.2735 (>0.50)     0.0407 (>0.50)

For the model using $y(\lambda)$: minimum of $\delta_i$ = 29.22, and $-\sum_{i=1}^{n}\log\Phi(\delta_i) \approx 0$.

In this example, the Box-Cox estimate $\hat\lambda = 0.307$ is close to the estimate $1/3$ derived
from dimensional considerations. All of the three models pass the EDF tests easily, with the
untransformed data giving slightly better values of $W^2$ and $A^2$. However, a close look at residual
plots (Figure 1) suggests that the transformed models are better than the untransformed one.
It appears that normality is sacrificed a little in order to obtain overall better fits.
Figure 1: Q-Q plots and residual plots for Example 2 (panels for Model 1, Model 2 and Model 3).

Q-Q plots and plots of residuals against regressors $x_1$ and $x_2$ for Example 2. Model 1 uses $y$,
model 2 uses $y(\hat\lambda)$, and model 3 uses $y(1/3)$; $r_i$ denotes the standardized residuals.
Example 3. Biological Data. In Table 1 of Box and Cox (1964), the entries are the survival
times (unit is 10 hours) of animals in a 3 x 4 completely randomized factorial experiment.
The factors are Poison Content with three levels and Treatment with four levels.

Three main effect models are fitted to the data as in the two previous examples; the third
model (with $\lambda = -1$, the closest integer to $\hat\lambda$) is included for comparison. Table 4 clearly shows
that the power transformation improves the model fit considerably.

Table 4: EDF tests of fit for three main effect linear models, biological data, Example 3.

               Parameter Estimates                 Modified EDF (P-value)
Model          $\hat\sigma$   $\hat\lambda$   $\hat g$    $A^2$              $W^2$

$y$            0.1582         -               -           1.0373 (<0.05)     0.1572 (<0.05)
$y(\lambda)$   0.3916         -0.75           62          0.1974 (>0.50)     0.0281 (>0.50)
$y(-1)$        0.4931         -1.00           64          0.2861 (>0.50)     0.0387 (>0.50)

For the model using $y(\lambda)$: minimum of $-\delta_i$ = 3.667, and $-\sum_{i=1}^{n}\log\Phi(-\delta_i) = 0.0005$.
4 Theory of the Tests


4.1 The case $\lambda = 0$

In order to obtain and use the asymptotic distributions of $W^2$ and $A^2$, a key step is to show
that the (estimated) empirical process

$$
Y_n(t) = \sqrt{n}\{F_n(t) - t\} \tag{4.1}
$$

converges weakly to a Gaussian process with zero mean and a manageable covariance function.
In Theorem 4.1, we cover the situation when $\lambda = 0$. The first part of the theorem gives the
asymptotic distribution of $\hat\theta$, the m.l.e. of $\theta$.

Theorem 4.1 In model (1.3), suppose that

(A) $X = (1_n \; U)$ is such that $1_n^tU = 0$, where $1_n$ is an $n \times 1$ vector of 1's,

(B) $E = \lim_{n\to\infty} n^{-1}X^t\mu_2$ and $b = \lim_{n\to\infty} n^{-1}1_n^t\mu_4$ exist for any $\beta \in \Omega$, where $\Omega$ is an open
convex subset of $R^p$, $\mu = X\beta$, and $\mu_k$ is an $n \times 1$ vector with its $i$th component equal to $(x_i^t\beta)^k$,
$k = 2, 4$,

(C) $\Lambda = \lim_{n\to\infty} n^{-1}X^tX$ exists and is positive definite,

(D) there are constants $M_1$ and $M_2$ such that for any $n$ and any $i = 1, \ldots, n$,

—  max  |z,,|  <  Ml, 

~  max  |z,jl  <  M2, 

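(E) $c_1 = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\eta_i^2$, $c_2 = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\eta_i^4$, and $a = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\eta_i^2u_i^t$ exist,
where $\eta_i = u_i^t(\beta_2, \ldots, \beta_p)^t/\sqrt{\nu}$ and $u_i^t$ is the $i$th row of $U$,


then

(1) when $\lambda = 0$, the maximum likelihood estimate $\hat\theta$ of $\theta = (\lambda, \beta^t, \nu)^t$ is asymptotically
normal, that is,

$$
\sqrt{n}(\hat\theta - \theta) \to_d N(0, \Gamma), \quad \text{where} \quad
\Gamma = \begin{pmatrix} A & B \\ B^t & C \end{pmatrix}^{-1}
$$

with

$$
A = (4\nu)^{-1}(7\nu^2 + 10\nu\beta^t\Lambda\beta + b), \qquad
B = (-D^t/2 - E^t/(2\nu), \; -\beta_1/\nu), \qquad
C = \begin{pmatrix} \Lambda/\nu & 0 \\ 0^t & 1/(2\nu^2) \end{pmatrix},
$$

where $D = (1, 0, \ldots, 0)^t$ is a $p \times 1$ vector with its first component equal to 1 and all the
other components equal to 0;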


(2) when $\lambda = 0$, the (estimated) empirical process $Y_n(t) = \sqrt{n}\{F_n(t) - t\}$ converges weakly to
a Gaussian process $Y(t)$ with zero mean and covariance function

$$
\rho(s, t) = \min(s, t) - st - J_1(s)J_1(t) - \frac{1}{2}J_2(s)J_2(t) - \frac{1}{g}J_3(s)J_3(t), \tag{4.2}
$$

where $J_1(t) = \phi(\Phi^{-1}(t))$, $J_2(t) = \Phi^{-1}(t)J_1(t)$, $J_3(t) = [\{\Phi^{-1}(t)\}^2 - 1]J_1(t)$, and $s, t \in [0, 1]$;
the constant $g$ is given by

$$
g = 6 + 8c_1 + c_2 - c_1^2 - a\left(\lim_{n\to\infty} n^{-1}U^tU\right)^{-1}a^t. \tag{4.3}
$$

The existence of the limit in (4.3) is assured by assumption (C) above. The proof of Theorem
4.1 is given in the Appendix.
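For readers who want to evaluate the limiting covariance numerically, the sketch below codes it under the assumption that it has the form $\rho(s,t) = \min(s,t) - st - J_1(s)J_1(t) - \frac{1}{2}J_2(s)J_2(t) - \frac{1}{g}J_3(s)J_3(t)$, with $J_1(t) = \phi(\Phi^{-1}(t))$, $J_2(t) = \Phi^{-1}(t)J_1(t)$ and $J_3(t) = [\{\Phi^{-1}(t)\}^2 - 1]J_1(t)$. The function name `rho` and the argument `inv_g` (standing for $1/g$) are hypothetical; only the Python standard library is used.

```python
from math import pi
from statistics import NormalDist

_STD = NormalDist()

def rho(s, t, inv_g):
    """Covariance of the limiting Gaussian process for 0 < s, t < 1.
    inv_g = 1/g; inv_g = 0 corresponds to no Box-Cox transformation."""
    zs, zt = _STD.inv_cdf(s), _STD.inv_cdf(t)
    j1s, j1t = _STD.pdf(zs), _STD.pdf(zt)                 # J_1
    j2s, j2t = zs * j1s, zt * j1t                         # J_2
    j3s, j3t = (zs**2 - 1) * j1s, (zt**2 - 1) * j1t       # J_3
    return min(s, t) - s * t - j1s * j1t - 0.5 * j2s * j2t - inv_g * j3s * j3t

# At s = t = 1/2 and g = 6 the value reduces to 1/4 - (7/6)/(2*pi).
value = rho(0.5, 0.5, 1.0 / 6.0)
```

Setting `inv_g` to 0 recovers the covariance for the usual test of normality with estimated mean and variance, which matches the $1/g = 0$ row of Table 1.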

Comment. Hinkley (1975) obtained the asymptotic variance-covariance matrix of $\sqrt{n}(\hat\theta - \theta)$
for the one-sample problem when $\lambda = 0$ and where there is no regression involved, so that
$g$ then equals 6. (There is a misprint in his derivation because the asymptotic covariance of
$\hat\mu$ and $\hat\nu$ should be $2\mu(\nu + \mu^2)/3$, instead of $2\mu(\nu + \mu^2)$.) In Theorem 4.1 we have given the
asymptotic variance-covariance matrix for the more general case of linear models.
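The corrected covariance quoted in the Comment can be checked numerically. The sketch below assumes that, in the one-sample case at $\lambda = 0$, the per-observation Fisher information for $(\lambda, \mu, \nu)$ has entries $(7\nu^2 + 10\nu\mu^2 + \mu^4)/(4\nu)$ for $\lambda$, $-1/2 - \mu^2/(2\nu)$ and $-\mu/\nu$ for the cross terms with $\mu$ and $\nu$, and $1/\nu$ and $1/(2\nu^2)$ on the remaining diagonal; these follow from $Y_i'(0) = W_i^2/2$ and $Y_i''(0) = W_i^3/3$ with $W_i \sim N(\mu, \nu)$. The function name and the numerical values of $\mu$ and $\nu$ are arbitrary choices for illustration; NumPy is assumed.

```python
import numpy as np

def fisher_info_lambda0(mu, nu):
    """Per-observation Fisher information for (lambda, mu, nu) at lambda = 0,
    one-sample case (no regressors)."""
    a = (7 * nu**2 + 10 * nu * mu**2 + mu**4) / (4 * nu)
    b_mu = -0.5 - mu**2 / (2 * nu)
    b_nu = -mu / nu
    return np.array([[a,    b_mu,     b_nu],
                     [b_mu, 1.0 / nu, 0.0],
                     [b_nu, 0.0,      1.0 / (2 * nu**2)]])

mu, nu = 1.5, 0.8                                  # arbitrary illustrative values
cov = np.linalg.inv(fisher_info_lambda0(mu, nu))   # asymptotic covariance of sqrt(n)(theta_hat - theta)
cov_mu_nu = cov[1, 2]                              # should equal 2*mu*(nu + mu^2)/3
var_lambda = cov[0, 0]                             # should equal 2/(3*nu)
```

Inverting the matrix reproduces the value $2\mu(\nu + \mu^2)/3$ for the covariance of $\hat\mu$ and $\hat\nu$, and gives $2/(3\nu)$ for the variance of $\hat\lambda$.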
4.2 The case $\lambda \neq 0$


For $\lambda \neq 0$, the integrals involved in the Fisher information matrix are not tractable (see the
Appendix), so the asymptotic variance-covariance matrix of $\hat\theta$ was examined numerically. It
appears that, under mild conditions on the model matrix $X$, maximum likelihood estimates of
the parameters in model (1.3) are again asymptotically normal, for general $\lambda$ values, and have
variance-covariance matrices with the usual Fisher structure. Concerning the empirical process
$Y_n(t) = \sqrt{n}\{F_n(t) - t\}$, we conjecture that for general $\lambda$, $\sigma$ and $\beta$, this can be approximated
by a Gaussian process $Y_G(t)$ with zero mean and covariance function

$$
\rho_G(s, t) = \min(s, t) - st - \Psi_G^t(s)\Gamma_G\Psi_G(t), \quad 0 \le s, t \le 1, \tag{4.4}
$$

where $\Gamma_G$ is a $(p + 2) \times (p + 2)$ matrix and $\Psi_G(t)$ is a $(p + 2) \times 1$ column vector function of $t$.
Both $\Gamma_G$ and $\Psi_G(t)$ depend on $n$. Expressions for $\Gamma_G$ and $\Psi_G(t)$ are given in the Appendix.

The next theorem shows that $\rho_G(s, t)$, for many situations occurring in practice, can be
well approximated by $\rho(s, t)$.

Theorem 4.2 Let $g_n$ be the quantity given in (2.14) but using the true parameter values.
If (i) $\lambda \to 0$, or (ii) $\sigma \to 0$, or (iii) $\beta_1 \to +\infty$, then the covariance function $\rho_G(s, t)$ of (4.4)
converges (pointwise) to the covariance function $\rho(s, t)$ of (4.2) with $g$ in (4.2) replaced by $g_n$.
In general, this pointwise convergence holds if $|(\beta_1 + 1/\lambda)/\sigma| \to \infty$. If $n \to \infty$ is added as a
condition, then this pointwise convergence holds without modification.

The proof of Theorem 4.2 is given in the Appendix.
5 Accuracy of the tests


In this section we discuss the accuracy of the tests given in Section 2. These tests use the
points given in Table 1, which are asymptotic points for distributions corresponding to $\lambda = 0$.
There are two issues to consider: (a) the accuracy of these points when $\lambda \neq 0$, and (b) the
accuracy of the tests in practice, when the asymptotic points are used with finite samples.

Theorem 4.2 suggests that, even when $\lambda \neq 0$, it will often be the case that $\rho_G(s, t)$ can
be approximated closely by $\rho(s, t)$, and then Table 1 points might be accurate for practical
purposes. This was first studied by comparing $\rho_G(s, t)$ to $\rho(s, t)$ numerically. The results
showed that $\rho_G(s, t) \approx \rho(s, t)$ when $L$ of (2.1) can be well approximated by $l_{BC}$ of (2.5); as was
discussed in Section 2, this is frequently assumed to be the case, and indeed occurs commonly
in practice.

More importantly, to assess the accuracy of the tests, the exact significance levels
corresponding to points in Table 1 were calculated from the correct asymptotic distributions using
$\rho_G(s, t)$, for a range of parameter values $\lambda$, $\mu$ and $\sigma$. A small sample of results is given in
Table 5. These are for statistic $W^2$, and for the model with no regression, so that $g = 6$ and
$\beta_1 = \mu$. The upper percentiles of the asymptotic distribution of $W^2$, taken from Table 1, and
their significance levels, are given in column 1. Recall that these are calculated using $\rho(s, t)$.
The next four columns give the values of $\lambda$, $\mu$ and $\sigma$, and the significance levels when the
points in column 1 are inserted into the correct asymptotic distribution using $\rho_G(s, t)$. For
these examples, $\delta = (\mu + 1/\lambda)/\sigma$; recall that, if $\lambda > 0$ and $\delta = \infty$, or if $\lambda < 0$ and $\delta = -\infty$,
Table 1 would be exactly correct. It is clear from Table 5 that Table 1 can be used to give
excellent approximations to asymptotic percentage points of $W^2$, even when $\delta$ is far from its
limit $\infty$ or $-\infty$. Similar results hold for $A^2$.

Table 5: A comparison of asymptotic significance levels for various $\lambda$, $\mu$ and $\sigma$ values based on $W^2$.

                                          Parameter Values
Upper Percentiles    $\lambda = 0.5$   $\lambda = 0.4$    $\lambda = 0.6$    $\lambda = -0.5$
($\alpha$ levels)    $\mu = 10$        $\mu = 0.0$        $\mu = 0.5$        $\mu = -13$
                     $\sigma = 0.5$    $\sigma = 0.8$     $\sigma = 1.0$     $\sigma = 1.0$
                     ($\delta = 24$)   ($\delta = 3.125$) ($\delta = 2.167$) ($-\delta = 15$)

0.0428 (0.50)        0.500             0.528              0.521              0.500
0.0663 (0.20)        0.201             0.198              0.192              0.201
0.0836 (0.10)        0.100             0.098              0.088              0.100
0.1007 (0.05)        0.051             0.048              0.040              0.050
0.1406 (0.01)        0.010             0.009              0.006              0.010

The next question which arises is how well the asymptotic points approximate the correct
points for finite sample size $n$. In many goodness-of-fit situations it has been verified (see,
for example, Stephens, 1986) that the points for finite $n$, for $W^2$ and $A^2$ and other members
of the Cramér-von Mises family, converge rapidly to the asymptotic points, and the situation
appears to be the same for the present problem. Linnet (1988) empirically studied the use of
the Anderson-Darling statistic $A^2$ and the Cramér-von Mises statistic $W^2$ to test for normality
of the power-transformed data in one-sample problems. Linnet concluded that the null
distributions of $A^2$ and $W^2$ depend neither on the transformation parameter $\lambda$ nor on the mean $\mu$
and variance $\nu$. A table was provided for $A^2$ and $W^2$ for finite samples in which the asymptotic
critical points were obtained by extrapolation.

The accuracy of the asymptotic points in Table 1 was investigated here by another method,
using the tree data of Example 2 and the biological data of Example 3 in the following simulation
study. Consider the tree data. The value of $W^2$ is 0.0450 and using the asymptotic points
the P-value is 0.590. The accuracy of this P-value was examined by taking the estimates
of the parameters as the true values and simulating new samples based on this model. The
Box-Cox transformation procedure was then applied to each simulated sample and the EDF
statistics calculated. The fraction of $W^2$ values which exceeded 0.0450 gives the empirically
derived P-value. This was repeated for the statistic $A^2$ and the whole experiment repeated
again for the biological data. Table 6 gives a comparison between the P-values of the data
and the empirical P-values. It can be seen that they are very close, and give evidence that
the asymptotic points in Table 1 can be used safely for samples of reasonable size (we suggest
$n > 20$).

Table 6

Example    Statistic              Data P-value    Empirical P-value

2          $P(W^2 > 0.0450)$      0.5895          0.5833
2          $P(A^2 > 0.2925)$      0.6282          0.6100
3          $P(W^2 > 0.0281)$      0.8700          0.8850
3          $P(A^2 > 0.1974)$      0.8848          0.9117
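The simulation study described above can be sketched as a parametric bootstrap. The code below is an illustration under stated assumptions, not the authors' program: it fits $\lambda$ by a grid search on the Box-Cox profile likelihood, uses $v_i = \Phi(y_i^*)$ as in step (b) of Section 2.3, works with $W^2$ only, and runs on synthetic data rather than the tree or biological data. All function names are hypothetical; NumPy is assumed.

```python
import numpy as np
from statistics import NormalDist

_PHI = np.vectorize(NormalDist().cdf)

def boxcox(y, lam):
    """Box-Cox transform (1.1)."""
    return np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam

def fit_and_w2(y, X, grid):
    """Maximise the profile likelihood (2.6) over a grid; return (lam, beta, nu, W^2)."""
    n, sum_log_y = len(y), np.sum(np.log(y))
    pinv = np.linalg.pinv(X)
    best = None
    for lam in grid:
        z = boxcox(y, lam)
        resid = z - X @ (pinv @ z)
        loglik = -(n / 2) * np.log(resid @ resid / n) + (lam - 1) * sum_log_y
        if best is None or loglik > best[0]:
            best = (loglik, float(lam), z, resid)
    _, lam, z, resid = best
    nu = float(resid @ resid) / n
    v = np.sort(_PHI(resid / np.sqrt(nu)))            # v_i = Phi(y_i^*)
    i = np.arange(1, n + 1)
    w2 = float(np.sum((v - (2 * i - 1) / (2 * n)) ** 2) + 1.0 / (12 * n))
    return lam, pinv @ z, nu, w2

def simulate(lam, mean, sd, rng):
    """Draw responses from the fitted model; redraws enforce the truncation in (1.3)."""
    z = mean + sd * rng.standard_normal(len(mean))
    if abs(lam) < 1e-12:
        return np.exp(z)
    bad = lam * z + 1.0 <= 0
    while bad.any():
        z[bad] = mean[bad] + sd * rng.standard_normal(int(bad.sum()))
        bad = lam * z + 1.0 <= 0
    return (lam * z + 1.0) ** (1.0 / lam)

def bootstrap_pvalue(y, X, n_sim=100, grid=np.linspace(-1.0, 1.5, 51), seed=0):
    """Fraction of simulated W^2 values exceeding the observed W^2."""
    rng = np.random.default_rng(seed)
    lam, beta, nu, w2_obs = fit_and_w2(y, X, grid)
    mean, sd = X @ beta, np.sqrt(nu)
    sims = [fit_and_w2(simulate(lam, mean, sd, rng), X, grid)[3] for _ in range(n_sim)]
    return float(np.mean(np.array(sims) > w2_obs))

# Synthetic data (hypothetical): n = 31, one covariate, true lambda = 0.3.
rng = np.random.default_rng(1)
n = 31
X = np.column_stack([np.ones(n), rng.uniform(0.0, 2.0, n)])
y = simulate(0.3, X @ np.array([5.0, 1.0]), 0.3, rng)
p_value = bootstrap_pvalue(y, X)
```

The number of simulated samples is kept small here so the sketch runs quickly; a larger `n_sim` gives a more stable empirical P-value, and the same loop can be repeated with $A^2$ in place of $W^2$.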

6  Summary 

In summary, tests have been given to assess the normality of a linear model fitted to values
obtained by a Box-Cox transformation. The effect of estimating several parameters is contained
in one critical parameter which must be used to enter Table 1. The points given in this
Table are correct asymptotically for $\lambda = 0$, but will also give very good approximations to
correct points in most circumstances when $\lambda \neq 0$. They are also good approximations to the
correct points for finite samples of reasonable size.

This work was supported by the Natural Sciences and Engineering Research Council of
Canada, and the authors express their thanks for this support.


Appendix 


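A Proof of Theorem 4.1


Proof of (1). Let $\mu = (\mu_1, \ldots, \mu_n)^t = X\beta$. When $\lambda = 0$, $W_i = Y_i(0) = \log Y_i \sim N(\mu_i, \nu)$.
Denote $dY_i(\lambda)/d\lambda$ by $Y_i'(\lambda)$ and $d^2Y_i(\lambda)/d\lambda^2$ by $Y_i''(\lambda)$; then $Y_i'(0) = W_i^2/2$, $Y_i''(0) = W_i^3/3$.
Straightforward calculations show that the inverse of the Fisher information matrix for
$\theta = (0, \beta^t, \nu)^t$ is given by

$$
\Gamma_n = \begin{pmatrix} A_n & B_n \\ B_n^t & C_n \end{pmatrix}^{-1},
$$

where

$$
n^{-1}A_n = (4\nu)^{-1}(7\nu^2 + 10\nu n^{-1}\mu^t\mu + n^{-1}1_n^t\mu_4), \qquad
n^{-1}B_n = (-D^t/2 - n^{-1}\mu_2^tX/(2\nu), \; -\beta_1/\nu),
$$

$$
n^{-1}C_n = \begin{pmatrix} (n^{-1}X^tX)/\nu & 0 \\ 0^t & 1/(2\nu^2) \end{pmatrix},
$$

where $D = (1, 0, \ldots, 0)^t$ is a $p \times 1$ vector with its first component equal to 1 and all the other
components equal to 0. Therefore, as $n \to \infty$, $n\Gamma_n \to \Gamma$ as desired.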


Proof of (2). The proof is based on Loynes (1980). In the present case, the null hypothesis
$H_n(j)$ in Loynes (1980) specifies nothing and all the parameters $\theta = (\lambda, \beta^t, \nu)^t$ are to be
estimated. Since $Y_i(0) = \log Y_i \sim N(\mu_i, \nu)$, all of the expectations needed to form the Fisher
information matrix can be found exactly. However, to make the proof more readable, it is
assumed that $\beta = 0$ and $\nu = 1$. Then the inverse of the Fisher information matrix, that is, the
asymptotic variance-covariance matrix, for $\theta = (0, 0^t, 1)^t$ is found to be

$$\begin{pmatrix} 2/3 & 1/3 & 0^t & 0 \\ 1/3 & 7/6 & 0^t & 0 \\ 0 & 0 & G^{-1} & 0 \\ 0 & 0 & 0^t & 2 \end{pmatrix},$$

where $G = \lim_{n \to \infty} n^{-1}U^tU$ and $X = (1_n \ U)$. It is readily checked that assumptions A1
and A2 of Loynes (1980) are satisfied naturally by model (1.3). Assuming assumption (D), it
can be checked that assumptions A4 and A5 of Loynes are also satisfied; assumptions A7 and
A9(6) of Loynes can be checked by direct calculations. Therefore, by Theorem 1 of Loynes
(1980), the estimated empirical process of (4.1) converges weakly to a Gaussian process
$Y(t)$. By Corollary 1 of Loynes (1980), the mean of $Y(t)$ is zero and the covariance function
of $Y(t)$ is

$$\rho(s,t) = \min(s,t) - st - \Psi^t(s)\Gamma^{-1}\Psi(t),$$

where $\Psi(t)$ is found to be

«(i)  =  (-5(*-(0)V.((). /.(OB’,  5^,(1))'.  (^.*€(0.1]). 

Direct  computations  then  show  that  the  expression  given  in  (4.2)  follows  for  the  case  where 
g  =  6.  □ 
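The entries of the information matrix at $\lambda = 0$, $\beta = 0$, $\nu = 1$ that involve $\lambda$, the intercept and $\nu$ can be checked with exact arithmetic. The sketch below is an illustration under stated assumptions, not the authors' computation: it assumes the standard Box-Cox log-likelihood with Jacobian term $(\lambda - 1)\log y$, so that the per-observation scores at this point are $W - W^3/2$ for $\lambda$, $W$ for the intercept and $(W^2 - 1)/2$ for $\nu$, with $W \sim N(0, 1)$. The helper names are hypothetical, and the block for the remaining regression coefficients is left out.

```python
from fractions import Fraction


def normal_moment(k):
    """E[W**k] for W ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return Fraction(0)
    value = Fraction(1)
    for j in range(k - 1, 0, -2):
        value *= j
    return value


def expect_product(a, b):
    """E[a(W) * b(W)] for polynomials stored as {power: coefficient}."""
    return sum(ca * cb * normal_moment(pa + pb)
               for pa, ca in a.items() for pb, cb in b.items())


# Per-observation scores at lambda = 0, beta = 0, nu = 1, written in W = log(y).
scores = {
    "lambda": {1: Fraction(1), 3: Fraction(-1, 2)},   # W - W**3 / 2
    "beta_1": {1: Fraction(1)},                        # W (intercept)
    "nu": {0: Fraction(-1, 2), 2: Fraction(1, 2)},     # (W**2 - 1) / 2
}
names = list(scores)
info = [[expect_product(scores[r], scores[c]) for c in names] for r in names]

# The nu score is uncorrelated with the other two, so the 2 x 2 block inverts on its own.
a, b, d = info[0][0], info[0][1], info[1][1]
det = a * d - b * b
block_inverse = [[d / det, -b / det], [-b / det, a / det]]
nu_variance = 1 / info[2][2]

print(info)
print(block_inverse, nu_variance)
```

Under these assumptions the information block for $(\lambda, \beta_1)$ has entries 7/4, -1/2 and 1, and its inverse has entries 2/3, 1/3 and 7/6, while the entry for $\nu$ inverts from 1/2 to 2.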


B Expressions for $\Gamma_c$ and $\Psi_c(t)$


The $(p+2) \times (p+2)$ matrix $\Gamma_c$ and the $(p+2) \times 1$ function $\Psi_c(t)$ studied in Theorem 4.2
are given below.

Let $I_n$ be the Fisher information matrix for a random sample $Y_1, \ldots, Y_n$ from model (1.3).
Then

and for $\lambda > 0$, $I_n$ has the following components:

\  _  -f/*/  MW)(i.-2Av/?)  +  ^»(f.)'| 

/  Si""  -mA)  /■ 

‘{-m  ■ 

‘{S\  ■  -jaiTO  -  Xfttim  -  £  »|| 


r  1 

A  f 

\  dxdxj 

=  r{ 

ml  t
\  apdp^j  U  1/  v^^{6i)  ) 


( 


d^L  ] 
'  d Pd  V j 


=  XUiag 


(  m) 

\Uy/i/^(Si) 


- 1) + Si4>HSi) 


2uy/U<tf^{6i) 


-)  In, 


f  d^L  \  _  n  A  f  Si,f>{6i)  Si4>{Si)^Si){Sf  -  3)  +  Sf^^^Sj)  ] 

\  dudu)  ~  2u^  j’ 

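where $\mathrm{diag}(d_i)$ denotes the $n \times n$ diagonal matrix with $d_i$ as the $(i, i)$ element, and $1_n$ denotes an
$n \times 1$ column vector of 1's. The expressions for the case where $\lambda < 0$ can be obtained by
replacing $\delta_i$ by $-\delta_i$, except for $E\{-\partial^2 L/\partial\beta\,\partial\nu\}$, where the whole expression also needs to be
multiplied by $-1$. The $J_{1i}$'s are given by, for $\lambda > 0$,

$$
\begin{aligned}
J_{1i} &= E\{Y_i'^2(\lambda) + (Y_i(\lambda) - \mu_i)Y_i''(\lambda)\} \\
&= (\lambda^4\Phi(\delta_i))^{-1} \int_{-\delta_i}^{\infty} \phi(u)\big[\{(1 + \lambda\mu_i + \lambda\sigma u)\log(1 + \lambda\mu_i + \lambda\sigma u) - \lambda\mu_i - \lambda\sigma u\}^2 \\
&\qquad + \lambda\sigma u(1 + \lambda\mu_i + \lambda\sigma u)\log^2(1 + \lambda\mu_i + \lambda\sigma u) \\
&\qquad - 2\lambda\sigma u\{(1 + \lambda\mu_i + \lambda\sigma u)\log(1 + \lambda\mu_i + \lambda\sigma u) - \lambda\mu_i - \lambda\sigma u\}\big]\,du, \qquad \text{(B.2)}
\end{aligned}
$$

where $\mu_i = x_i^t\beta$, $\delta_i = (\mu_i + 1/\lambda)/\sqrt{\nu}$, and $Y_i'(\lambda)$ and $Y_i''(\lambda)$ are the first and second derivatives of
$Y_i(\lambda)$ with respect to $\lambda$, respectively; for the case $\lambda < 0$, the above integral should be taken
over the range $-\infty$ to $-\delta_i$ and $\Phi(\delta_i)$ should be replaced by $\Phi(-\delta_i)$.

Similarly, $E\{Y'(\lambda)\}$ has components $J_{2i}$ given by, for $\lambda > 0$,

$$
\begin{aligned}
J_{2i} &= E\{Y_i'(\lambda)\} \\
&= (\lambda^2\Phi(\delta_i))^{-1} \int_{-\delta_i}^{\infty} \phi(u)\{(1 + \lambda\mu_i + \lambda\sigma u)\log(1 + \lambda\mu_i + \lambda\sigma u) - \lambda\mu_i - \lambda\sigma u\}\,du, \qquad \text{(B.3)}
\end{aligned}
$$

and $E\{(Y(\lambda) - X\beta)Y'(\lambda)\}$ has components $J_{3i}$ given by, for $\lambda > 0$,

$$
\begin{aligned}
J_{3i} &= E\{(Y_i(\lambda) - \mu_i)Y_i'(\lambda)\} \\
&= (\lambda^2\Phi(\delta_i))^{-1} \int_{-\delta_i}^{\infty} \sigma\phi(u)u\{(1 + \lambda\mu_i + \lambda\sigma u)\log(1 + \lambda\mu_i + \lambda\sigma u) - \lambda\mu_i - \lambda\sigma u\}\,du. \qquad \text{(B.4)}
\end{aligned}
$$

In the case where $\lambda < 0$, the above two integrals should be taken over the range $-\infty$ to $-\delta_i$ and
$\Phi(\delta_i)$ should be replaced by $\Phi(-\delta_i)$.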
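Integrals of the type (B.3) and (B.4) have no closed form but are easy to evaluate numerically. The sketch below is one possible way to do so for $\lambda > 0$, using a composite Simpson rule; it is an illustration, not the authors' code. It assumes that the integrand of (B.3) is $\lambda^2 Y_i'(\lambda)$ written in terms of $u$ with $Y_i(\lambda) = \mu_i + \sigma u$, and it cuts the normal tails at a hypothetical `cutoff` of 10 standard deviations. All function names are assumptions made for the example.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def lam2_y_prime(u, lam, mu, sigma):
    """lam**2 * Y'(lam) as a function of u, where Y(lam) = mu + sigma * u."""
    x = lam * (mu + sigma * u)          # so that 1 + x = y**lam
    if x <= -1.0:
        return 0.0                      # boundary of the support
    return (1.0 + x) * math.log1p(x) - x


def simpson(f, a, b, steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def j2_j3(lam, mu, sigma, cutoff=10.0):
    """Numerical values of J_2i and J_3i for lam > 0 (normal tails cut at +/- cutoff)."""
    delta = (mu + 1.0 / lam) / sigma
    lower = max(-delta, -cutoff)
    scale = 1.0 / (lam ** 2 * Phi(delta))
    j2 = scale * simpson(lambda u: phi(u) * lam2_y_prime(u, lam, mu, sigma), lower, cutoff)
    j3 = scale * simpson(lambda u: sigma * phi(u) * u * lam2_y_prime(u, lam, mu, sigma),
                         lower, cutoff)
    return j2, j3


j2, j3 = j2_j3(lam=0.01, mu=1.0, sigma=0.5)
print(j2, j3)
```

For small $\lambda$ the results should be close to the $\lambda \to 0$ limits $E(W^2/2) = (\mu^2 + \sigma^2)/2$ and $E\{(W - \mu)W^2/2\} = \mu\sigma^2$ with $W \sim N(\mu, \sigma^2)$, which gives a simple sanity check on the implementation.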
For the function $\Psi_c(t)$, there is


»ow=- 


"  Ei 


where $J$ is a $(p+2) \times 1$ column vector function with the following components, where
$j = 1, \ldots, p$ corresponds to the components associated with $\beta$:


» 

*r(“'(i)  =  . 

k 


+  Aatn.)  log(l  +  Ap, 

+A<Ttni)  -  A/i<  -  A<rto.}  +  ^(6,)$(ti»0  -  ^(5.)1,  if  A  >  0, 

-(<7A’$’(-^,))"*[^(v.)*(-^f){(l  +  A/1,-  +  A<rw,-)log(l  +  A/i, 

+A<ru,-)  -  A/i,-  -  A<n;,-}  -  if  A  <  0, 

{xy/(a$2(5.-))}{<^(tn.-)$(5.-)  +  ^(5.-)$(u;,)  -  ^(5.-)},  if  A  >  0,  ^ 

if  A  <  0, 

{2a^9^Si))-^  {<f>{wi)wii{Si)  -  6i<l>{Si)i{wi)  +  if  A  >  0, 

+  6i,f,(-Si)i!f(v{)},  if  A  <  0, 


(B.5) 


where $w_i = w_i(t) = \Phi^{-1}(1 + \Phi(\delta_i)(t - 1))$, $v_i = v_i(t) = \Phi^{-1}(t\,\Phi(-\delta_i))$, $t \in [0, 1]$.
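The quantities $w_i(t)$ and $v_i(t)$ can be read as quantile functions of a standard normal variable truncated to $(-\delta_i, \infty)$ for $\lambda > 0$ and to $(-\infty, -\delta_i)$ for $\lambda < 0$. The following sketch works under that reading, which is an interpretation rather than a statement taken from the text, and checks that each truncated distribution function returns $t$. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def w_quantile(t, delta):
    """Quantile of N(0, 1) truncated to (-delta, inf); the lambda > 0 case."""
    return STD_NORMAL.inv_cdf(1.0 + STD_NORMAL.cdf(delta) * (t - 1.0))


def v_quantile(t, delta):
    """Quantile of N(0, 1) truncated to (-inf, -delta); the lambda < 0 case."""
    return STD_NORMAL.inv_cdf(t * STD_NORMAL.cdf(-delta))


delta, t = 1.5, 0.3
w = w_quantile(t, delta)
v = v_quantile(t, delta)
# Both truncated distribution functions should return t.
print((STD_NORMAL.cdf(w) - STD_NORMAL.cdf(-delta)) / STD_NORMAL.cdf(delta))
print(STD_NORMAL.cdf(v) / STD_NORMAL.cdf(-delta))
```

Note that `inv_cdf` requires an argument strictly between 0 and 1, so the sketch applies to interior values of $t$ only.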

C  Proof  of  Theorem  4.2 


Define  6  =  {fii  +  l/A)/<7  and  rji  =  u‘(jS2,...,/?p)V‘^»  where  u|  is  the  row  of  U.  Then  as 
$\delta \to \infty$, $\phi(\delta_i) \to 0$ and $\Phi(\delta_i) \to 1$. Now let $\delta \to \infty$; we have an expansion for the function
$\Psi_c(t)$ given by

$$\Psi_c(t) = -\{J_1(t)/\sigma\}\,(e_n(t),\ D,\ \Phi^{-1}(t)/(2\sigma))^t, \qquad \text{(C.6)}$$

where $D = (1, 0, \ldots, 0)$ is a $1 \times p$ vector and $e_n(t)$ is given by

c^{t)  =  ^  +  j{Slog{\<7S)-S  +  \oz{X<T6)9-'{t) 


”  1  A, 


(C.7)

Similarly, letting $\delta \to \infty$, we can obtain an expansion for $\Gamma_c$ given by


-1 


On  ^ 

6„  iX*X)/n  0 

0‘  l/(2a^)  ) 


\  ^ 


(C.8) 


where  6n  is  the  result  of  replacing  taking  expectations;  Cn  is  the 

result  of  multiplying  (C.7)  by  ^“'(Oi  replacing  $“*(0  taking  expectations;  and 

$a_n$ can be obtained in a similar but more involved fashion. The key point is that, with the
above expansions, $\Psi_c^t(s)\Gamma_c^{-1}\Psi_c(t)$ turns out to depend on $\delta$ and the $\eta_i$'s, and not to depend on
A,  a  and  fit.  Using  elementary  row  and  column  reductions  to  simplify  into  an 

expression in terms of $e_n(s)$, $e_n(t)$, $a_n$, $b_n$ and $c_n$, and substituting the above expansions into
this expression will lead to the desired result. Details of the algebra are available from the
authors. □


D  Calculation  of  Percentage  Points 


The asymptotic distribution of $W^2$ is determined by the eigenvalues of $\rho(s,t)$ and the
asymptotic distribution of $A^2$ is determined by the eigenvalues of $\rho(s,t)/\{s(1-s)t(1-t)\}^{1/2}$.
In general, let the covariance function of an integral type statistic $T$ be $k(s,t)$. Then the limiting
distribution of $T$ has the form $\sum_{i=1}^{\infty}\lambda_i\chi_i^2$, where the $\chi_i^2$ are independent chi-square random
variables on 1 degree of freedom, and the $\lambda_i$'s are the eigenvalues of the integral equation

$$\lambda f(s) = \int_0^1 k(s,t)f(t)\,dt.$$


For  the  general  theory  behind  the  above  statements,  see  Durbin  (1973).  In  this  paper,  the 
above  equation  is  discretized  into 


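$$\lambda f_i = \frac{1}{m}\sum_{j=1}^{m} k\left(\frac{i-0.5}{m}, \frac{j-0.5}{m}\right) f_j \qquad (i = 1, \ldots, m)$$

for a large integer $m$ ($m = 100$ is used in this paper) and can be solved for eigenvalues $\lambda_i$
($i = 1, \ldots, m$). Then the distribution of $\sum_{i=1}^{\infty}\lambda_i\chi_i^2$ can be approximated by $\sum_{i=1}^{m}\lambda_i\chi_i^2 + r\chi_{m+1}^2$,
where $\chi_{m+1}^2$ is a chi-square random variable on 1 degree of freedom and independent of the $\chi_i^2$
($i = 1, \ldots, m$), and $r$ is found by making

$$\int_0^1 k(s,s)\,ds = \sum_{i=1}^{\infty}\lambda_i = \Big(\sum_{i=1}^{m}\lambda_i\Big) + r$$

true. For example, for $\int_0^1 k(s,s)\,ds = 0.0492385$, $\sum_{i=1}^{m}\lambda_i = 0.0492413$ so $r = -2.8 \times 10^{-6}$. Finally,
the percentage points are found using the numerical Fourier inversion method of Imhof (1961).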
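The three steps described above (midpoint discretization, the remainder weight $r$, and Imhof inversion) can be sketched in a few lines. The example below is an illustration under stated assumptions and does not reproduce the paper's computations. It uses the demonstration kernel $\min(s,t) - st$, which is the classical case with no estimated parameters and not the $\rho(s,t)$ of this paper, because its eigenvalues $1/(\pi^2 j^2)$ and the upper 5% point 0.46136 of the classical $W^2$ are known. To stay dependency-free, only the largest eigenvalue (by power iteration) and the trace, which equals the sum of all $m$ eigenvalues, are computed from the discretized matrix; any symmetric eigenvalue routine would give the full spectrum. All function names are hypothetical.

```python
import math


def demo_kernel(s, t):
    """Demonstration kernel only: min(s, t) - s * t (no estimated parameters)."""
    return min(s, t) - s * t


def discretized_kernel(k, m):
    """Midpoint-rule matrix with entries k(s_i, s_j) / m, where s_i = (i - 0.5) / m."""
    grid = [(i - 0.5) / m for i in range(1, m + 1)]
    return [[k(s, t) / m for t in grid] for s in grid]


def largest_eigenvalue(matrix, iterations=60):
    """Power iteration for a symmetric non-negative definite matrix."""
    n = len(matrix)
    vec = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        new = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        value = math.sqrt(sum(x * x for x in new))
        vec = [x / value for x in new]
    return value


def imhof_upper_tail(weights, x, upper=600.0, steps=6000):
    """P(sum_j weights[j] * chi2_1 > x) by Imhof's inversion formula,
    integrated with the composite Simpson rule on [0, upper] (steps must be even)."""
    def integrand(u):
        if u == 0.0:
            return 0.5 * sum(weights) - 0.5 * x      # limit as u -> 0
        theta = 0.5 * sum(math.atan(w * u) for w in weights) - 0.5 * x * u
        log_rho = 0.25 * sum(math.log1p((w * u) ** 2) for w in weights)
        return math.sin(theta) / (u * math.exp(log_rho))

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return 0.5 + (h / 3.0) * total / math.pi


m = 100
K = discretized_kernel(demo_kernel, m)
trace = sum(K[i][i] for i in range(m))    # equals the sum of the m eigenvalues
r = 1.0 / 6.0 - trace                     # the integral of s(1 - s) over [0, 1] is 1/6
top = largest_eigenvalue(K)
print(r, top, 1.0 / math.pi ** 2)

# Imhof inversion with the known eigenvalues plus a remainder weight.
weights = [1.0 / (math.pi * j) ** 2 for j in range(1, m + 1)]
weights.append(1.0 / 6.0 - sum(weights))
p = imhof_upper_tail(weights, 0.46136)
print(p)
```

For this kernel the remainder from the discretized matrix is small and negative, of the same order as the value of $r$ quoted above, and the tail probability at 0.46136 should be close to 0.05. The integration limit `upper` and the number of Simpson steps are ad hoc choices that suit this kernel and would need to be revisited for other weight sequences.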


